Abstract. In this paper, we present the simulation of a simple, yet significantly
powerful, sequential model by cellular automata. The simulated model is called
oblivious multi-head one-way finite automata and is characterised by having its
heads moving only forward, on a trajectory that only depends on the length of the
input. While the original finite automaton works in linear time, its corresponding
cellular automaton performs the same task in real time, that is, exactly the length
of the input. Although not truly a speed-up, the simulation may be interesting
and reminds us of the open question about the equivalence of linear and real times
on cellular automata.

1. Introduction



Cellular automata (CA for short), first introduced by J. von Neumann [7] as
self-replicating systems, have been recognised as a major model of massively parallel
computation since A. R. Smith, in 1969, used this Turing-complete model to compute
functions [8]. Their simple and homogeneous description as well as their ability to
distribute and synchronise the information in a very efficient way contribute to their
success. However, to determine to what extent CA can speed up sequential
computation is not a simple task.

As regards specific sequential problems, the gain in speed by the use of CA is
manifest [1, 2, 3]. But when we try to get general simulations, we have to face the
delicate question of whether parallel algorithms are always faster than sequential
ones. An inherent difficulty arises from the fact that efficient parallel algorithms
often make use of techniques that are radically different from the sequential ones.
There might also exist a faster CA for each individual sequential solution whereas no
general simulation exists.

Hence, no surprise: the known simulations of Turing machines by CA provide 
no parallel speed-up. The early construction of Smith [8] simulates one step of the 
Turing machine by one step of the CA. Furthermore, no faster simulations have 
been reported yet, even for almost all restricted variants. In particular, we do not
know whether any finite automata with $k$ heads can be simulated on CA in less than
$O(n^k)$ steps, which is the sequential time complexity.

We will not give answers to such issues here, but we shall examine in this context 
a simple sequential model, called oblivious multi-head finite automata. This device 
was introduced by M. Holzer in [4] as multi-head finite automata with an additional
constraint of obliviousness: the trajectory of the heads only depends on the length
of the input. As emphasised in [3], such finite automata lead to significant
computational power: they characterise parallel complexity $NC^1$. Their properties have
been further discussed in [5].

We will focus on the one-way version of this model, that is, for which the reading 
heads can only move forward (that makes it strictly less powerful). While no true 
speed-up can be hoped for, as these one-way finite automata already perform their 
task in linear time, we will describe a simulation of them by real-time CA, that is, CA 
working in linear time with a multiplicative constant equal to 1. Whereas specifying
this constant is usually irrelevant, CA represent a particular case amongst models
of computation, as we do not know whether linear and real times are equivalent for
them.

The article is organised as follows: section 2 introduces the two models
considered, section 3 displays some of their features and abilities and section 4 presents
the simulation algorithm.

2. Definitions 

2.1. Multi-head finite automata 

Given an integer $k \geq 1$, a one-way $k$-head finite automaton is a finite automaton
reading an input word using $k$ heads that can move to the right or stand still.

Definition 2.1. A (deterministic) one-way multi-head finite automaton (1DFA($k$)
for short) is a septuple $(\Sigma, Q, \lhd, q_0, Q_a, k, \delta)$, where $\Sigma$ is a finite set of
input symbols (or letters), $Q$ is a finite set of states, $\lhd \notin \Sigma$ is the (right)
end-marker, $q_0 \in Q$ is the initial state, $Q_a \subseteq Q$ is the set of the accepting states,
$k \geq 1$ is the number of heads and $\delta : Q \times (\Sigma \cup \{\lhd\})^k \to Q \times \{0, 1\}^k$ is the
transition function; 1 means to move the head one letter to the right and 0 to keep it
on its current letter. For the heads to be unable to move beyond the end-marker, we
require that if $\delta(q, a_1, \ldots, a_k) = (q', m_1, \ldots, m_k)$, then for any $i \in [1, k]$,
$a_i = \lhd \Rightarrow m_i = 0$.

A configuration of a 1DFA($k$) on an input word $w \in \Sigma^n$ at a certain time $t \geq 0$
is a couple $(p, q)$ where $p \in [0, n]^k$ is the position of the multi-head and $q$ the current
state. The computation of such a device on this input word starts with all heads on
the first letter, and ends when all heads have reached the end-marker. If the current
state is then within $Q_a$, the word is said to be accepted, otherwise it is rejected.
The language $L(\mathcal{F})$ recognised by a 1DFA($k$) $\mathcal{F}$ is the set of the words accepted by
$\mathcal{F}$. One can notice that a 1DFA($k$) ends its computation in linear time.

We will focus now on data-independent 1DFA (1DIDFA), a particular class of 
1DFA for which the path followed by the heads only depends on the length of the 
input word, not on the letters thereof.

Definition 2.2. Given $k \geq 1$, a 1DFA($k$) $\mathcal{F}$ is said to be oblivious (or
data-independent) if there exists a function $f_{\mathcal{F}} : \mathbb{N}^2 \to \mathbb{N}^k$ such that the position of
its multi-head at time $t \in \mathbb{N}$ on any input word $w$ is $f_{\mathcal{F}}(|w|, t)$.
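
To make Definitions 2.1 and 2.2 concrete, here is a minimal Python sketch. It is an illustration under stated assumptions and not part of the original construction: the names `run`, `is_oblivious_on_length`, `delta_oblivious` and `delta_dependent`, the character `"<"` used as end-marker and the `max_steps` guard are all hypothetical. The check only compares head trajectories over all words of one given length, which is a necessary condition for obliviousness on that length, not a proof for every length.

```python
from itertools import product

END = "<"  # stand-in for the right end-marker (assumed not to be an input letter)

def run(delta, q0, k, word, max_steps=10_000):
    """Run a one-way k-head automaton on `word`.

    Returns the final state and the list of multi-head positions over time.
    """
    n = len(word)
    tape = list(word) + [END]
    pos, state, path = (0,) * k, q0, [(0,) * k]
    while any(p < n for p in pos):          # stop once every head is on the end-marker
        if len(path) > max_steps:
            raise RuntimeError("no halt within max_steps")
        state, moves = delta(state, tuple(tape[p] for p in pos))
        # a head standing on the end-marker is not allowed to move
        assert all(m == 0 for p, m in zip(pos, moves) if p == n)
        pos = tuple(p + m for p, m in zip(pos, moves))
        path.append(pos)
    return state, path

def is_oblivious_on_length(delta, q0, k, alphabet, n):
    """Check that all words of length n yield the same head trajectory."""
    paths = {tuple(run(delta, q0, k, "".join(w))[1])
             for w in product(alphabet, repeat=n)}
    return len(paths) == 1

def delta_oblivious(state, letters):
    """Toy 2-head rule: moves depend only on end-marker tests and a step parity."""
    phase, acc = state
    m1 = 1 if letters[1] != END else 0
    m0 = 1 if letters[0] != END and (phase == 1 or letters[1] == END) else 0
    if m0 and letters[0] == "a":
        acc ^= 1                            # parity of the a's read by head 0
    return (1 - phase, acc), (m0, m1)

def delta_dependent(state, letters):
    """Toy 2-head rule whose second head waits on the letter read by the first."""
    m0 = 1 if letters[0] != END else 0
    m1 = 1 if letters[1] != END and letters[0] in ("a", END) else 0
    return state, (m0, m1)
```

With these toy rules, `is_oblivious_on_length(delta_oblivious, (0, 0), 2, "ab", 3)` holds, whereas `delta_dependent` already yields two different trajectories on the words of length 2.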

2.2. Cellular automata 

A cellular automaton is a parallel synchronous computing model consisting of 
an infinite number of finite automata called cells which are distributed on Z and 
share the same transition function, depending on the considered cell's previous state 
as well as its two neighbours'. 

Definition 2.3. A cellular automaton is a quintuple $(\Sigma, Q, \#, Q_a, \delta)$, where $\Sigma$ is
the finite set of input symbols (or letters), $Q \supseteq \Sigma$ is the finite set of states and
$\delta : Q^3 \to Q$ the transition function. $\# \in Q \setminus \Sigma$ is a particular quiescent state,
verifying $\delta(\#, \#, \#) = \#$. $Q_a \subseteq Q$ is the set of the accepting states.

A configuration is a function $\mathfrak{C} : \mathbb{Z} \to Q$. A site is a cell at a certain time step
of the computation we consider; $\langle c, t \rangle$ will denote the state of the site $(c, t) \in \mathbb{Z} \times \mathbb{N}$.
The computation of a CA $\mathcal{C}$ on an input word $w$ of size $n \geq 1$ starts at time 0
with all cells in state $\#$ except cells 0 to $n - 1$ where the letters of the word are
written. This is the initial configuration $\mathfrak{C}_w$ associated to $w$. Then the cells update
in parallel their respective states according to $\delta$: for all $(c, t) \in \mathbb{Z} \times \mathbb{N}$,
$\langle c, t + 1 \rangle = \delta(\langle c - 1, t \rangle, \langle c, t \rangle, \langle c + 1, t \rangle)$.

This input word is accepted in time $t \geq n$ if and only if cell 0 (the origin) is
in an accepting state at time $t$. The language $L_\tau(\mathcal{C})$ recognised by the automaton
in time $\tau : \mathbb{N} \to \mathbb{N}$ is the set of the words $w$ it accepts in time $\tau(|w|)$. If $\tau$ is the
identity function $\mathrm{Id}$, $L_\tau(\mathcal{C})$ is said to be recognised in real time.
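
The definitions of this subsection can be tried out with a small Python sketch. It is a hedged illustration only: the helper names (`step`, `origin_state_at`, `delta_no_b`, `accepts_in_real_time`) and the toy language (non-empty words over {a, b} containing no b) are assumptions made for the example and do not come from the construction of this paper. The infinite line of cells is replaced by a finite window padded with quiescent cells, wide enough that the boundary cannot influence the origin within the simulated time.

```python
QUIESCENT = "#"

def step(config, delta):
    """One synchronous update of a finite window of cells.

    Cells outside the window are treated as quiescent.
    """
    padded = [QUIESCENT] + config + [QUIESCENT]
    return [delta(padded[i - 1], padded[i], padded[i + 1])
            for i in range(1, len(padded) - 1)]

def origin_state_at(word, delta, t):
    """State of cell 0 at time t, for t >= 1, when the CA starts on `word`."""
    pad = t                                  # information travels at most one cell per step
    config = [QUIESCENT] * pad + list(word) + [QUIESCENT] * pad
    for _ in range(t):
        config = step(config, delta)
    return config[pad]                       # index of the origin inside the window

def delta_no_b(left, centre, right):
    """Toy rule: a letter b spreads leftwards at speed 1; everything else is kept."""
    if centre == "b" or right == "b":
        return "b"
    return centre

def accepts_in_real_time(word):
    """Acceptance at time |word| with accepting set {"a"} (word must be non-empty)."""
    return origin_state_at(word, delta_no_b, len(word)) in {"a"}
```

A letter b initially in cell $x \leq n - 1$ reaches the origin at time $x$, hence before time $n$, so the origin is in state a at time $n$ exactly when the word contains no b. Note that `delta_no_b("#", "#", "#")` returns `"#"`, as required of the quiescent state.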

Real time is, for CA, the simplest nontrivial time complexity,
in the sense that it is the minimal time required for the output to depend on all letters
of the input. Yet, it is significantly powerful, as we do not even know whether linear 
time can achieve strictly more. Real time had already been evoked in [S]. 

3. Preliminaries 

We would like to simulate a 1DIDFA on a CA as fast as possible. A computation 
of a general 1DFA requires a number of time steps that is linear in the size of the 
input word. Whereas it is rather easy for a CA to simulate such a device in linear 
time, there is a priori no obvious way to reduce this time bound. But we can do it in
the case of DIDFA by taking the constraint of obliviousness into account. Before
performing such a simulation, however, we should detail some useful features of DIDFA
and CA.



(Footnote to Definition 2.3.) Notice that CA are defined herein with the standard
neighbourhood of radius 1, that is, such that the state of a cell at time $t + 1$ depends
on the states at time $t$ of this same cell and its two nearest neighbours.

3.1. Some features of multi-head finite automata

Let $\mathcal{F} = (\Sigma, Q, \lhd, q_0, Q_a, k, \delta)$ be a 1DIDFA, $n \geq 1$ be an integer and $w \in \Sigma^n$
be a word of size $n$. Let us look at the computation of $\mathcal{F}$ on input word $w$. As the
multi-head is composed of $k$ heads, it can be regarded as a device moving one point
at a time in any direction within the set $W = [0, n]^k$.

As $\mathcal{F}$ is data-independent, we can separate the path $P$ taken by the multi-head
from the consecutive states of the automaton (depending on the letters of $w$). In
other words, we can take a look at the path of the multi-head on input word $a^n$, for
any $a \in \Sigma$; it will be the same for $w$. Hence, the trajectory will become periodic
after at most $|Q|$ moves, until one head reaches an end-marker. Then, while the
latter head does not move any longer, after another $|Q|$ moves the trajectory will
become periodic again, and so on until all heads have reached the end of the input
word. The key points of $W$ where a head reaches the end-marker will be useful to us
and denoted as finite sequence $(p_i)_{i \in [0, k]}$, with $p_0 = (0, \ldots, 0)$ and $p_k = (n, \ldots, n)$.

Some notations. For convenience, we number the heads such that for all $i \in [0, k - 1]$,
head $i$ is the one that reaches the end-marker as the multi-head arrives at key
point $p_{i+1}$. For all $i \in [0, k]$ and all $j \in [0, k - 1]$, we denote the $(j + 1)$-th coordinate
of $p_i$ by $p_{i,j}$, and if $i < k$ name $P_i \subseteq P$ the portion of trajectory that lies between $p_i$
and $p_{i+1}$.
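
As an illustration of these notations, the following Python sketch extracts the key points from a given head trajectory. The function names `key_points` and `ratios` and the sample path are hypothetical; the path is simply assumed to be the list of successive multi-head positions of some oblivious automaton on inputs of length $n$.

```python
from fractions import Fraction

def key_points(path, n):
    """Return the starting point followed by the points of `path` at which
    a head first reaches position n (one entry per arriving head)."""
    points = [path[0]]
    for prev, cur in zip(path, path[1:]):
        arrived = sum(1 for a, b in zip(prev, cur) if a < n and b == n)
        points.extend([cur] * arrived)
    return points

def ratios(points, n):
    """Coordinates of the key points divided by n, as exact fractions."""
    return [tuple(Fraction(x, n) for x in p) for p in points]

# Hypothetical trajectory of a 2-head automaton on inputs of length 3.
sample_path = [(0, 0), (0, 1), (1, 2), (1, 3), (2, 3), (3, 3)]
sample_keys = key_points(sample_path, 3)
```

Here `sample_keys` is `[(0, 0), (1, 3), (3, 3)]`. In this toy trajectory the second coordinate reaches the end-marker first, so under the numbering convention above it would be called head 0.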







Figure 1: A representation of W for k = 3. The periodic parts of the path of the 
multi-head are drawn in black. 



3.2. A few basic techniques on cellular automata 

A given computation of a CA can be easily represented by drawing successive 
configurations each one above its predecessor. We thus obtain a space-time diagram, 
composed of sites, of which we only need to represent those in a non-quiescent state. 

We will often have to perform several rather independent computations at the 
same time; this can easily be done by a 'product' automaton which works with a 
finite number of layers, each one of which supports a specific computation. Although 
rather independent, the layers can communicate between one another to exchange 
information, as any cell can see all of them.

Compression of the input word. In section 4 we will need to compress the input by
some rational factor $\rho \geq 2$. This is easy to do with a CA. It consists in having the
input word written on the (discrete) straight line of equation $t - 1 = (\rho - 1)(c + 1)$,
where $t$ represents the time and $c$ a cell, as shown on fig. 2. As the concerned sites
'know' that they lie on this straight line, a computation using the compressed input
word can then occur within the triangle of real time (in light grey on fig. 2).
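
The equation of this line can be checked on a small case with the Python sketch below. It rests on assumptions that are not spelled out in the text: the factor is taken to be an integer, every letter is assumed to travel leftwards at speed 1, cell $c$ is assumed to store the block of letters $\rho c, \ldots, \rho c + \rho - 1$, and one extra time step is added for the marking of the last letter. The names `compression_sites` and `on_line` are hypothetical.

```python
def compression_sites(n, rho):
    """Map each cell c to the time at which its block of rho letters is complete and marked.

    Assumes an integer factor rho >= 2 and letters travelling left at speed 1.
    """
    sites = {}
    for c in range((n + rho - 1) // rho):      # number of cells of the compressed word
        last = min(rho * c + rho - 1, n - 1)   # index of the last letter stored in cell c
        arrival = last - c                     # a letter starting in cell x reaches cell c at time x - c
        sites[c] = arrival + 1                 # extra step, assumed to be the marking step
    return sites

def on_line(c, t, rho):
    """Test whether site (c, t) satisfies t - 1 = (rho - 1)(c + 1)."""
    return t - 1 == (rho - 1) * (c + 1)
```

For full blocks the last letter $\rho c + \rho - 1$ arrives at time $(\rho - 1)(c + 1)$, which gives the stated equation once the extra step is added. When $n$ is not a multiple of $\rho$, the last (incomplete) block falls off the line in this sketch; a rational factor is not handled here.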



Acceleration by a constant. For any constant $T \in \mathbb{N}$ and any CA $\mathcal{C}$, there exists a
CA $\mathcal{C}'$ such that $L_{\mathrm{Id}}(\mathcal{C}') = L_{\mathrm{Id}+T}(\mathcal{C})$. In other words, to prove that a given language
is CA-recognisable in real time, it suffices to exhibit a CA recognising it in time
$\mathrm{Id} + T$. For more details, one can refer to [6].



Figure 2: Schematic space-time diagrams (left: $\rho = 2$; right: $\rho = 5/2$) during which
input word $w$ is compressed by rational factor $\rho$, along the line of equation
$t - 1 = (\rho - 1)(c + 1)$. Each sequence of linked dots represents a letter of
$w$. The sites containing the compressed version of $w$ are encircled. Notice
that even though it seems the letters could be shifted one time step earlier,
this first step is in fact used to mark the last letter; it is necessary because
of rounding issues.



4. Simulation 

Theorem 4.1. Given $k \geq 1$, for any 1DIDFA($k$) $\mathcal{F}$ recognising a language $\mathcal{L}$, there
exists a CA $\mathcal{C}$ recognising $\mathcal{L}$ in real time.

The rest of this paper will be devoted to the proof of this theorem. We assume 
now that we have a 1DIDFA($k$) $\mathcal{F} = (\Sigma, Q, \lhd, q_0, Q_a, k, \delta)$. We will define a CA
$\mathcal{C} = (\Sigma, Q', \#, Q'_a, \delta')$ such that $L_{\mathrm{Id}}(\mathcal{C}) = \mathcal{L}$. Instead of giving the full description of
its state set and transition function, we will describe its behaviour on an arbitrary
input word $w \in \Sigma^n$, given an integer $n \geq 1$. Within this coming description (and
similarly in the whole article) the terms 'constant' and 'finite' refer to quantities
that do not depend on $n$.

4.1. Principle

The general principle of the simulation is rather simple: instead of having k heads 
moving along w, we will have (at least) k copies of w shifted over a segment S of 
sites (of strictly increasing time steps) so that each site sees the correct letters of w. 
Moreover, the letters for each head will be seen in reverse order compared to what
$\mathcal{F}$ does.

Each part $P_i$ of the trajectory of the multi-head can be assimilated to a discrete
straight line, with no aperiodic part. Indeed, as illustrated in fig. 3, the distance
(in letters) between any point of $P_i$ and the point of this line corresponding to the same
time step is bounded by some value $K = O(|Q|)$. Thus, during the execution of $\mathcal{C}$
over $w$, before doing anything, all cells bearing the input will gather the letters of
their $K$ nearest neighbours. This is done in time $K$.
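
The gathering phase can be pictured with the following Python sketch, given here as an assumption-laden illustration (the function name `gather` and the dictionary representation are hypothetical). Each cell merges, at every step, what its two neighbours already know; after $K$ steps it knows the letters at distance at most $K$.

```python
def gather(word, K):
    """After K synchronous steps, cell c knows the letters of cells c-K .. c+K.

    Knowledge is stored as {absolute position: letter} for readability.
    """
    n = len(word)
    known = [{c: word[c]} for c in range(n)]   # initially, a cell only knows its own letter
    for _ in range(K):
        nxt = []
        for c in range(n):
            merged = dict(known[c])
            if c > 0:
                merged.update(known[c - 1])    # what the left neighbour knew one step earlier
            if c < n - 1:
                merged.update(known[c + 1])    # what the right neighbour knew one step earlier
            nxt.append(merged)
        known = nxt
    return known
```

In an actual CA the cells would store letters at relative offsets $-K, \ldots, K$ rather than absolute positions, so that the state set stays finite (at most $2K + 1$ letters per cell); absolute positions are used here only to make the result easy to read.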



Figure 3: $P_i$ lies within a band of width $O(|Q|)$, here drawn in white. It can hence
be assimilated to a (discrete) straight line, provided a counter (within the
shifting copies of $w$ during the execution of $\mathcal{C}$) indicates for each point
of this line the corresponding position within the period of $P_i$. Notice
that although the band can broaden as $i$ increases, this index only rises
up to a constant value, so that the maximal width $K$ remains bounded
independently of the size of the input.



4.2. Key sites 

We will set $S = \{(c, n - 1 - c + T) : c \in [0, n - 1]\}$, where $T$, which is to be
defined (cf. subsection 4.3), is an integer greater than $K$ that does not depend on $n$.
The result of the execution is to appear on site $s_0 = (0, n - 1 + T)$. To know which
speed the copies of $w$ should be shifted at over each site of $S$, the latter segment
should be divided into parts $S_i$, each one of which corresponds to part $P_i$ of $P$. In
other words, we want to mark some key sites $s_i = (c_i, n - 1 - c_i + T) \in S$ that
represent key points $p_i \in P$. The main difficulty is that key cell $c_i$ has to represent
coordinate $p_{i,j}$ for any head $j$.

For this purpose, we observe first that for all $(i, j) \in [0, k] \times [0, k - 1]$, since each
part of $P$ is as illustrated in fig. 3, there exists $\alpha_{i,j} \in \mathbb{Q} \cap [0, 1]$ such that
$|p_{i,j} - \alpha_{i,j} n| \leq K$, whatever the size $n$ of the input. One can notice that we
automatically have $\alpha_{0,j} = 0$ and $\alpha_{i,j} = 1$ for all $i > j$, and that $(\alpha_{i,j})_i$ is an
increasing sequence for all $j$.

Then, we provisionally assume that $\alpha_{j,j} = 0 \Rightarrow j = 0$, and set key cell $c_i = \lfloor \alpha_i n \rfloor$,
where $\alpha_i = \frac{1}{2} \prod_{j=i}^{k-1} \alpha_{j,j}$. The case wherein there exists some $j$ that does not verify
this hypothesis will be treated in subsection 4.6.







Now, how do we mark site $s_i$? No trouble if $i = 0$, as $c_0$ is the origin. If $i > 0$, it
is also feasible: it suffices to send a signal from the origin at speed $\frac{\alpha_i}{1 - \alpha_i} \leq 1$
(cf. fig. 4). Note that in the definition of $\alpha_i$, we have divided by 2 in case some key
cells would be too far from the origin to be marked in time (in CA configurations,
information cannot travel at speed of absolute value strictly greater than 1). All
our computation has hence to be performed within half as much space as what
$S$ provides. In any case, the definition of $\alpha_i$ is based on the assumption that the
copies of the input shifting over $S_i$ are compressed versions of $w$.




Figure 4: Schematic space-time diagram of the marking of cells $c_i$, for $k = 3$.
Notice that $(c_i)_i = (\lfloor \alpha_i n \rfloor)_i$ is always an increasing sequence (since
$\alpha_i = \alpha_{i,i} \alpha_{i+1} \leq \alpha_{i+1}$), with $c_0 = 0$ and $c_k = \lfloor n/2 \rfloor$.
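
The recurrence $\alpha_i = \alpha_{i,i} \alpha_{i+1}$ with $\alpha_k = \frac{1}{2}$ can be evaluated directly, as in the Python sketch below. The function name `key_cells` and the numerical values of the diagonal coefficients are hypothetical; they are chosen for illustration and are not those of any figure of this paper.

```python
from fractions import Fraction
from math import floor

def key_cells(diag, n):
    """Compute (alpha_0, ..., alpha_k) and the key cells c_i = floor(alpha_i * n).

    diag[j] stands for alpha_{j,j}, for j in 0..k-1, with diag[0] == 0.
    """
    k = len(diag)
    alphas = [Fraction(0)] * (k + 1)
    alphas[k] = Fraction(1, 2)                  # alpha_k = 1/2
    for i in range(k - 1, -1, -1):
        alphas[i] = diag[i] * alphas[i + 1]     # alpha_i = alpha_{i,i} * alpha_{i+1}
    cells = [floor(a * n) for a in alphas]
    return alphas, cells

# Hypothetical coefficients for k = 3 heads and an input of size 60.
example_alphas, example_cells = key_cells(
    [Fraction(0), Fraction(1, 2), Fraction(2, 3)], 60)
```

With these values the key cells are 0, 10, 20 and 30: the sequence is increasing, starts at the origin and ends at $\lfloor n/2 \rfloor$, as stated in the caption of Figure 4.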

4.3. Compression of the input 

For each $i \in [1, k]$, we want to compress input word $w$ (on a specific layer
corresponding to head $i - 1$) by factor $\frac{1}{\alpha_i}$ as illustrated in fig. 5, that is, on some
straight line $D_i$ of direction vector $(1, \frac{1 - \alpha_i}{\alpha_i}) = (1, \frac{1}{\alpha_i} - 1)$. One can notice we are able to
choose $D_i$ such that it crosses the origin at any time $t \geq \lfloor \frac{1}{\alpha_i} \rfloor$. Thus, we will make all
such lines cross the origin at the same time $T \in \mathbb{N}$. As $(\frac{1}{\alpha_i})_i$ is a decreasing sequence
and as we have done some computations in time $K$ beforehand, we set $T = K + \lfloor \frac{1}{\alpha_1} \rfloor$.
Hence, we have finally set $D_i$ to be the line of equation $t - T = \frac{1 - \alpha_i}{\alpha_i} c$ (cf. fig. 5).

4.4. Shift of the input 

Consider some head j £ [0, k — 1] and an integer % £ On layer £j, 

which corresponds to this head, we want to shift the compressed input at some 
constant speed £] — between D i+ i and Di, so that the correct letters pass 
over Si. One can notice = by the definition of otj+i and Oj. But this not 
necessarily the case when i < j. Indeed, ^ should be defined as equal to 1 _^ J , 
with & ~ = di — oti jf a ■ — cc i j > and & ,• = ctj otherwise. This way, 

is the speed of the signal we would use to mark cell = |_A,j n J (°f- fig- E])- 
Figure 5: Different compressed copies of the input shifted over $S$, with the trajectory
of the letter initially contained by some cell $c$ displayed. Layer $\mathcal{L}_j$
corresponds to head $j < k = 3$. In this example, we have $(\alpha_{1,1}, \alpha_{1,2}, \alpha_{2,2}) =$
(|> i !)■ Hence, (a , «i, a 2 , a 3 ) = (0, |§, |, §) and #l j2 = ^5. 

4.5. Backtracking 

Now that we have ensured the correct letters are seen in reverse order for each
head on each segment $S_i$, how do we get site $s_0$ to know the result of the execution
of $\mathcal{F}$ over $w$? All we need to know is whether the final state of $\mathcal{F}$ is accepting, that
is, belongs to $Q_a$.

Let $p$ be a point of $P$ such that $p \neq p_0$. One can observe that if we know $q$, the
state $\mathcal{F}$ is in when its multi-head is on $p$, as well as the letter $l_j \in \Sigma$ each head $j$
reads when the multi-head lies on the predecessor $p'$ of $p$, then we can compute
the possible states of $\mathcal{F}$ at point $p'$. That is, the subset $Q'$ of $Q$ such that for all
$q' \in Q$, $\delta(q', l_0, \ldots, l_{k-1}) = (q, p - p') \Leftrightarrow q' \in Q'$. Likewise, if we know $\mathcal{F}$ is in a
state of $Q'' \subseteq Q$ at point $p$, we can determine the subset $Q'$ such that for all $q' \in Q$,
$\delta(q', l_0, \ldots, l_{k-1}) \in Q'' \times \{p - p'\} \Leftrightarrow q' \in Q'$. We will refer to this process as reading
$\delta$ backward.
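
A single step of reading $\delta$ backward amounts to computing a preimage, as in this minimal Python sketch. The function name `read_backward`, the one-head parity automaton and the character `"<"` used as end-marker are assumptions made for the illustration.

```python
END = "<"          # stand-in for the end-marker
STATES = {0, 1}    # toy state set: parity of the number of a's read so far

def delta(q, letters):
    """Toy 1-head transition function: flip the parity on 'a', always move right."""
    (letter,) = letters
    if letter == END:
        return q, (0,)
    return (q + (letter == "a")) % 2, (1,)

def read_backward(delta, states, letters, move, targets):
    """Return Q' = {q' : delta(q', letters) = (q, move) for some q in targets}."""
    result = set()
    for q in states:
        nxt, mv = delta(q, letters)
        if mv == move and nxt in targets:
            result.add(q)
    return result
```

For instance, `read_backward(delta, STATES, ("a",), (1,), {0})` is `{1}`: the only way to be in state 0 after reading an a is to have been in state 1 before. A move that $\delta$ never produces on the given letters yields the empty set.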

Let then $s = (c + 1, t - 1)$ be a site of $S$. As the letters it sees come from
compressed versions of $w$, it can represent a (finite) range of points of $P$ instead
of only one, depending on the part $S_i$ it belongs to. Now suppose it contains some
subset of $Q$ for each of the successive points of $P$ it represents. Suppose also these
subsets are consistent with one another (regarded as the possible states $\mathcal{F}$ is in at
each of these points). Then successor site $s' = (c, t)$ can read $\delta$ backward a finite
(but sufficient) number of times to get the possible subsets of its own points.

Site $s_k$ represents the last points of $P$, amongst which the very last point $p_k$.
So, we initiate our 'reverse' computation by setting the state of $s_k$ (on some layer $\mathcal{L}$
on which this computation is to be held) to contain subset $Q_a$ for point $p_k$ and
consistent ones for the predecessors it represents. By induction, every element of
$S$ will contain subsets that are consistent with $Q_a$ on layer $\mathcal{L}$. In particular, $s_0$ will
have the corresponding subset $Q_0$ for $p_0$, so that it just has to check whether $q_0 \in Q_0$
to know if $w$ is accepted by $\mathcal{F}$.
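
The induction above can be checked sequentially on a toy example. The Python sketch below is only an illustration of the backward pass along a known trajectory, not of its implementation on the sites of $S$: the 2-head automaton (accepting the words whose first and last letters coincide), its state names and the functions `accepts_forward`, `accepts_backward` and `agree_up_to` are all hypothetical.

```python
from itertools import product

END = "<"
STATES = [("scan", None), ("scan", "a"), ("scan", "b"), "yes", "no"]
INITIAL, ACCEPTING = ("scan", None), {"yes"}

def delta(state, letters):
    """Head 1 scans the word first, then head 0 does; the path depends only on n."""
    l0, l1 = letters
    if l1 != END:                          # head 1 scans, remembering the last letter read
        return ("scan", l1), (0, 1)
    if isinstance(state, tuple):           # head 1 just arrived: compare first and last letters
        state = "yes" if state[1] == l0 else "no"
    return state, (1 if l0 != END else 0, 0)

def accepts_forward(word):
    n, tape = len(word), list(word) + [END]
    pos, state = (0, 0), INITIAL
    while pos != (n, n):
        state, mv = delta(state, (tape[pos[0]], tape[pos[1]]))
        pos = (pos[0] + mv[0], pos[1] + mv[1])
    return state in ACCEPTING

def accepts_backward(word):
    n, tape = len(word), list(word) + [END]
    # trajectory known in advance: it is a function of n only
    path = [(0, j) for j in range(n + 1)] + [(i, n) for i in range(1, n + 1)]
    possible = set(ACCEPTING)              # subset attached to the last point p_k
    for prev, cur in zip(reversed(path[:-1]), reversed(path[1:])):
        letters = (tape[prev[0]], tape[prev[1]])
        move = (cur[0] - prev[0], cur[1] - prev[1])
        step = {q: delta(q, letters) for q in STATES}
        possible = {q for q, (nxt, mv) in step.items() if mv == move and nxt in possible}
    return INITIAL in possible             # check q_0 against the subset obtained for p_0

def agree_up_to(max_len, alphabet="ab"):
    """Compare both procedures on every word of length 1..max_len."""
    return all(accepts_forward("".join(w)) == accepts_backward("".join(w))
               for n in range(1, max_len + 1) for w in product(alphabet, repeat=n))
```

On this example both procedures agree on all words of length at most 5, and `accepts_backward("aba")` holds while `accepts_backward("ab")` does not.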

4.6. Adjustments 

In the preceding construction, we have put some details or particular cases aside. 
First, we have to mention that the whole process obviously works only for input 
words of size greater than some value depending on $K$ (for all $P_i$ to be assimilated
to straight lines as in fig. 3). Nevertheless, that leaves us a finite number of words
that are treated as special cases, so that the result is not affected.

Possibilities. As each $P_i$ is not a real straight line, the next part $P_{i+1}$ of the path
depends on which point of the period of $P_i$ the multi-head is at (that is, which state
it is in over word $a^n$) when head $i$ reaches the end-marker. In particular, there
can be at most $|Q|$ possible values $\alpha_i$, depending on $n$. Anyway, that makes a finite
number of possible $(k - 1)$-tuples $(\alpha_1, \ldots, \alpha_{k-1})$, and we can thus process all of them
in parallel.

It remains to select the right tuple at site $s_0$ or before. It can be done by looking at
the remainder of the Euclidean division of $n - |Q|$ by some finite value $f(|Q|)$. That
can be easily checked, for instance, on line $D_k$ with a finite counter. The choice will
be known at site $s_k$ and spread toward $s_0$.

Aperiodic parts. It may seem we know at any site along any $S_i$, thanks to what
precedes, which points of the period of $P_i$ we are simulating and so, which available
letters the cell has to use. This is in fact not true yet: when reaching site $s_i$, we
have to take the aperiodic part of $P_i$ into account, and therefore we must be able to
modify the last $|Q|$ moves (that is, to adjust the choice of letters) we have simulated
backward. That can be done by adding to the sites of $S$ a finite memory of the
letters seen.

Immobile heads. Suppose that, contrary to the hypothesis made in subsection 4.2,
there exists some $j > 0$ such that $\alpha_{j,j} = 0$. That means that head $j$ remains
motionless until $p_j$ and then covers the totality of the input during $P_j$. The trouble
is that it implies for all $i \leq j$, $\alpha_i = 0$. Therefore, $s_j = s_{j-1} = \cdots = s_0$, so that a
linear number of moves would have to be simulated on a single site.

A simple trick allows us to overcome this problem: for all $j \in [1, k - 1]$, we set
$\alpha'_{j,j} = \alpha_{j,j}$ if $\alpha_{j,j} > 0$ and $\alpha'_{j,j} = \frac{1}{2}$ otherwise, and set $\alpha'_{0,0} = \alpha_{0,0} = 0$. Then, in our
construction, we replace any $\alpha_{j,j}$ by $\alpha'_{j,j}$.

Finally, for each j > verifying atjj = 0, we still have to adjust shift speed <^j, 
which is equal to 0. All we have to do is to replace it by d • = 7^77 ( om y f° r this j), 
which makes the totality of the copy of w on layer £j shift over Sj. As regards indices 
i < j, we do not need to redefine the corresponding speed Qj, since head j makes 
no more moves. 

Conclusion 

We have described a construction that simulates oblivious multi-head one-way 
finite automata on real-time cellular automata. This is better (if linear and real
times are not equivalent) than what the naive (though nontrivial) simulation of
general multi-head finite automata would achieve, since the latter would result in a
linear-time CA.

In any case, this result fully exploits the obliviousness of the sequential compu- 
tation. Now, it is another challenge to get a similar parallel algorithm without the 
constraint of data-independence. 

Acknowledgement 

I would like to thank G. Richard and V. Terrier for introducing me to the matter 
of DIDFA (which resulted in a common article about a speed-up of two-way DIDFA 
by CA). I would also like to thank J. Ferte for useful brainstorming sessions before 
the blackboard and V. Poupet for his help. 

References 

[1] A. J. Atrubin. A one-dimensional real-time iterative multiplier. IEEE Transactions on
Electronic Computers, 14(1):394-399, 1965.

[2] Stephen N. Cole. Real-time computation by n-dimensional iterative arrays of finite-state
machines. IEEE Trans. Comput., 18(4):349-365, 1969.

[3] Karel Culik II. Variations of the firing squad problem and applications. Information Processing
Letters, 30(3):152-157, 1989. 

[4] Markus Holzer. Multi-head finite automata: Data-independent versus data-dependent
computations. Theoretical Computer Science, 286(1):97-116, 2002.

[5] Markus Holzer, Martin Kutrib, and Andreas Malcher. Multi-head finite automata: Charac- 
terizations, concepts and open problems. In Turlough Neary, Damien Woods, Anthony Karel 
Seda, and Niall Murphy, editors, The Complexity of Simple Programs (CSP'08), EPTCS, pages 
93-107, 2008. 

[6] Jacques Mazoyer and Nicolas Reimen. A linear speed-up theorem for cellular automata.
Theoretical Computer Science, 101(1):59-98, 1992.



(Footnote to subsection 4.6.) Notice that we could have chosen any rational value strictly between 0 and 1 instead of $\frac{1}{2}$.



FROM 1DIDFA TO REAL-TIME CA 75 



[7] John von Neumann. Theory of Self-Reproducing Automata. University of Illinois Press, Urbana,
IL, USA, 1966.

[8] Alvy R. Smith III. Simple computation-universal cellular spaces. Journal of the ACM,
18(3):339-353, 1971.



This work is licensed under the Creative Commons Attribution-NoDerivs
License. To view a copy of this license, visit
http://creativecommons.org/licenses/by-nd/3.0/



